Remark
Please be aware that these lecture notes are accessible online in an ‘early access’ format. They are actively being developed, and certain sections will be further enriched to provide a comprehensive understanding of the subject matter.
1.4. Understanding Geospatial Data
Geospatial data, a key component of spatial analysis and geographic sciences, refers to information that is linked directly or indirectly to a specific location or geographical area. This section delves into the characteristics of geospatial data, its various forms, and its significance in spatial analysis.
The spatial dimension adds a geographical perspective to data analysis. Mapping data by location makes it possible to detect patterns such as migration flows, urban development, and environmental change, and to reveal relationships between data points, such as proximity and clustering. These insights support resource allocation, urban planning, and emergency management, and they help ground policy decisions in spatially referenced evidence.
1.4.1. Types of Geospatial Data
In the context of a Geographic Information System (GIS), real-world observations—which include any objects or events that are measurable in two or three dimensions—must be translated into simplified spatial representations. This process involves distilling complex, real-world details into fundamental spatial constructs that can be effectively managed and analyzed within a GIS framework. These constructs are then modeled in one of two ways:
Vector Data Model: This approach captures the geometry and location of spatial entities using points, lines, and polygons. It is adept at representing discrete features with clear boundaries and precise locations, such as buildings, roads, or administrative borders.
Fig. 1.15 An example of vector data.
Raster Data Model: In this model, the spatial entities are depicted as a uniform grid of cells, with each cell holding a value to represent a particular attribute of that area, such as elevation or temperature. It is suited for continuous data that doesn’t have distinct boundaries, like rainfall distribution or land surface temperatures.
Fig. 1.16 An example of raster data.
Both are fundamental to GIS and are chosen based on the nature of the data and the specific requirements of the analysis being performed.
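To make the contrast concrete, the following minimal sketch stores the same feature in both models using only the Python standard library. The rectangular "lake", the grid dimensions, and the helper name point_in_polygon are all hypothetical choices made for illustration; a real workflow would normally rely on a GIS library for rasterization.

```python
# A unit-less toy example: the same rectangular "lake" in both data models.

# Vector model: the feature is an ordered ring of (x, y) vertices.
lake_vector = [(2.0, 1.0), (6.0, 1.0), (6.0, 4.0), (2.0, 4.0)]


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the closed ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray starting at (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Raster model: a regular grid in which each cell stores 1 (lake) or 0.
# A cell is marked as lake when its centre falls inside the polygon.
cell_size = 1.0
n_rows, n_cols = 5, 8
lake_raster = [
    [
        int(point_in_polygon((col + 0.5) * cell_size,
                             (row + 0.5) * cell_size,
                             lake_vector))
        for col in range(n_cols)
    ]
    for row in range(n_rows)
]

# Print the grid with the largest y on top, as on a map.
for row in reversed(lake_raster):
    print("".join("#" if value else "." for value in row))
```

The vector version needs four vertices regardless of how finely the lake is drawn, while the raster version needs one value per cell and its outline can only be as precise as the cell size allows.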
1.4.1.1. Vector Data
Vector data is a way of representing real-world features within the context of spatial analysis and geographic information systems (GIS). Here’s a breakdown of its components:
Points: The most basic form of vector data, points are used to represent discrete locations on the earth’s surface. Each point is defined by a pair of coordinates (latitude and longitude) and can symbolize locations like cities, wells, or trees.
Example: The dataset from the Calgary Public Library contains information about library locations and their hours of operation. Here we only represent libraries and their locations.
import geopandas as gpd
import folium
from shapely.geometry import Point
import numpy as np
from IPython.display import display

# Read the data file
gdf = gpd.read_file('../data/Calgary_Public_Library_Locations_and_Hours_20240620.csv')

# Function to create Point objects from "(lat, lon)" strings
def create_point(loc):
    lat, lon = map(float, loc.strip('()').split(', '))
    return Point(lon, lat)  # Shapely points are (x, y) = (lon, lat)

# Convert the 'Location' column to Point objects
gdf['Location'] = gdf['Location'].apply(create_point)

# Extract (lon, lat) pairs into a NumPy array
coords = np.array([(point.x, point.y) for point in gdf['Location']])

# Calculate the median longitude and latitude of all points
median_lon, median_lat = np.median(coords, axis=0)

# Create a map centered on the median point (folium expects [lat, lon])
m = folium.Map(location=[median_lat, median_lon], zoom_start=10, tiles="cartodbpositron",
               control_scale=True)

# Add markers to the map
icon_size = (8, 8)  # Icon size (width, height) in pixels
for _, row in gdf.iterrows():
    folium.Marker(
        location=[row['Location'].y, row['Location'].x],
        popup=row['Library'],
        icon=folium.CustomIcon(
            icon_image='../data/simple_icon.png',  # Replace with the path to your icon image
            icon_size=icon_size,
            icon_anchor=(0, 0),  # (0, 0) pins the icon's top-left corner to the location
            popup_anchor=(0, 0)  # (0, 0) opens the popup at the anchor point
        )
    ).add_to(m)

# Display the map
display(m)
Note - Map Scale
The scale in a GIS context is the ratio of a distance on the map to the corresponding distance on the ground. A large-scale map has a larger ratio (for example, 1:5,000 is larger than 1:500,000), so map features appear relatively large. This type of map covers a smaller area but with greater detail. At a scale of 1:5,000, 1 unit on the map equals 5,000 of the same units on the ground.
On these Folium maps, the scale is indicated in both kilometers and miles for convenience, such as 5 km or 5 mi, aiding in quick estimation of distances.
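The scale ratio translates directly into arithmetic. The short sketch below converts a distance measured on a map into a ground distance; the function name ground_distance_m and the sample measurements are hypothetical.

```python
def ground_distance_m(map_distance_cm, scale_denominator):
    """Ground distance in metres for a length measured on a 1:scale_denominator map."""
    # Map and ground share the same unit, so convert centimetres to metres at the end.
    return map_distance_cm * scale_denominator / 100.0


# The same 4 cm measured on a large-scale and on a small-scale map
large_scale = ground_distance_m(4, 5_000)    # 1:5,000
small_scale = ground_distance_m(4, 50_000)   # 1:50,000

print(f"1:5,000  -> {large_scale:.0f} m on the ground")
print(f"1:50,000 -> {small_scale:.0f} m on the ground")
```

The same map length covers ten times more ground at 1:50,000, which is why the smaller-scale map shows a larger area with less detail.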
Lines: Lines, or polylines, are sequences of points connected by straight segments that represent linear features such as rivers, roads, or utility lines. They are crucial for mapping routes and connections between different points.
Example: The following map displays the LRT tracks for the city of Calgary.
import geopandas as gpd
import folium
import pandas as pd
from shapely import wkt
from IPython.display import display

# Read the CSV file into a DataFrame
df = pd.read_csv('../data/Tracks_-_LRT_20240622.csv')

# Convert the WKT strings in the 'the_geom' column to LineString geometries
df['geometry'] = df['the_geom'].apply(wkt.loads)

# Filter the DataFrame to include only rows where RAIL_TYPE is 'LRT'
df_lrt = df[df['RAIL_TYPE'] == 'LRT']

# Create a GeoDataFrame
gdf_lrt = gpd.GeoDataFrame(df_lrt, geometry='geometry')

# Average the centroids of all geometries for the initial map center
avg_lat = gdf_lrt.geometry.centroid.y.mean()
avg_lon = gdf_lrt.geometry.centroid.x.mean()

# Create a folium map centered around the average coordinates
m = folium.Map(location=[avg_lat, avg_lon], zoom_start=10, tiles="cartodbpositron", control_scale=True)

# Add each LineString to the map (folium expects (lat, lon) pairs)
for _, row in gdf_lrt.iterrows():
    folium.PolyLine(
        locations=[(coord[1], coord[0]) for coord in row['geometry'].coords],
        color='green',  # Color used for the LRT tracks
        weight=2
    ).add_to(m)

# Display the map
display(m)
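Because a line is an ordered sequence of points, its length is the sum of the lengths of its segments. The sketch below estimates the length of a polyline given as (lon, lat) pairs using the haversine formula on a spherical Earth. The coordinates are made up for illustration and are not taken from the LRT dataset, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # Mean Earth radius; a spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def polyline_length_km(coords):
    """Sum the segment lengths of a polyline given as a list of (lon, lat) pairs."""
    return sum(haversine_km(*start, *end) for start, end in zip(coords, coords[1:]))


# Hypothetical three-vertex line near Calgary
track = [(-114.07, 51.04), (-114.07, 51.05), (-114.06, 51.05)]
print(f"Approximate length: {polyline_length_km(track):.2f} km")
```

For accurate engineering measurements, the usual approach is to reproject the data to a suitable projected coordinate system and measure there; the spherical formula is only an approximation.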
Polygons: Polygons are closed shapes formed by connecting line segments end-to-end so that the last segment returns to the starting point. They are used to represent areas like lakes, park boundaries, or property lots. Polygons have derived geometric properties such as area, perimeter, and centroid.
Example: The following data is a representation of the City of Calgary’s boundary in a MULTIPOLYGON format.
import folium
import geopandas as gpd
from IPython.display import display

# Load the GeoPackage file
gdf = gpd.read_file('../data/City Boundary_20240620.gpkg')

# Get the centroid of the merged boundary for the map center
centroid = gdf.geometry.unary_union.centroid
x, y = centroid.x, centroid.y

# Create a folium map centered on the boundary centroid
m = folium.Map(location=[y, x], zoom_start=9, tiles="cartodbpositron",
               control_scale=True)

# Add the boundary to the map as a GeoJSON layer
folium.GeoJson(gdf).add_to(m)

# Display the map
display(m)
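The area, perimeter, and centroid of a polygon can be derived directly from its vertices. The following sketch uses the shoelace formula and assumes a simple (non-self-intersecting) polygon without holes whose coordinates are in a projected, planar system such as metres. The function name polygon_properties and the sample lot are hypothetical.

```python
import math


def polygon_properties(ring):
    """Return (area, perimeter, centroid) of a simple polygon.

    `ring` is a list of (x, y) vertices in planar coordinates; the closing
    edge back to the first vertex is implied.
    """
    twice_area = 0.0
    cx = cy = 0.0
    perimeter = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1        # Shoelace term for this edge
        twice_area += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
        perimeter += math.hypot(x2 - x1, y2 - y1)
    signed_area = twice_area / 2.0       # Sign depends on vertex order
    centroid = (cx / (6.0 * signed_area), cy / (6.0 * signed_area))
    return abs(signed_area), perimeter, centroid


# Hypothetical 4 m x 3 m rectangular lot
lot = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
area, perimeter, centroid = polygon_properties(lot)
print(area, perimeter, centroid)
```

Applying these formulas to raw longitude and latitude values would give results in squared degrees, so geographic data should be reprojected first. A degenerate polygon with zero area would cause a division by zero in this sketch.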
Vector data is particularly valuable in applications that require high precision and detail. For example:
Cadastral Mapping: This involves creating maps that show property boundaries and land ownership. Precision is key here, as legal implications are involved.
Navigation Systems: GPS and other navigation tools use vector data to provide accurate turn-by-turn directions and route planning.
1.4.1.2. Raster Data
Raster data is a type of geospatial data representation that uses a matrix of cells, commonly referred to as pixels, to model the Earth’s surface and various phenomena. This method is particularly effective for capturing and conveying information that changes continuously over space, such as elevation, temperature, or land cover.
Grid of Pixels: Imagine a raster as a digital canvas where each pixel is a square paint dab. Each dab (pixel) carries specific information about that tiny square of the real world.
Pixel Values: The value of each pixel can represent different types of data. For example, in a temperature map, the pixel value might indicate the temperature at that location; in a digital elevation model, it would represent the height above sea level.
Example: Imagine a 2D array filled with random integers ranging from 0 to 255. By applying a colormap to a 2D plot of this array, we can create a visual representation. This method is akin to how we visualize diverse datasets, including elevation and land surface temperatures, to extract meaningful patterns from numerical values.
import numpy as np
import matplotlib.pyplot as plt
# Set the random seed for reproducibility
np.random.seed(0)
# Generate a random 10x10 array with values between 0 and 255
X = np.random.randint(256, size=(10, 10))
# Create the figure and axis objects with a specified size
fig, ax = plt.subplots(figsize=(6, 6))
# Display the array as an image with the 'Spectral' colormap
im = ax.imshow(X, cmap='Spectral')
# Set the aspect ratio of the axis to be equal
ax.set_aspect('equal')
# Add a colorbar to the figure with specified fraction and padding
cbar = fig.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
cbar.set_label('Color Intensity', rotation=270, labelpad=20, fontsize=16)
cbar.ax.tick_params(labelsize=16)
# Add a title to the plot
ax.set_title("Plot 2D Array with Spectral Colormap", fontsize=15)
# Disable the grid lines
ax.grid(False)
# Adjust layout to ensure everything fits without overlap
plt.tight_layout()
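A raster becomes geospatial once its grid is tied to real-world coordinates. A common convention, assumed in the sketch below, is to store the coordinates of the top-left corner together with the cell size, so that any location can be converted to a row and column. The grid values, the extent, and the function name world_to_pixel are hypothetical.

```python
def world_to_pixel(x, y, x_min, y_max, cell_size):
    """Return the (row, col) of the cell that contains the point (x, y)."""
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)  # Rows count downward from the top edge
    return row, col


# Hypothetical 3 x 4 elevation grid (metres) with 10 m cells.
# Its top-left corner sits at x = 0, y = 30 in a planar coordinate system.
elevation = [
    [1200, 1180, 1150, 1130],
    [1190, 1160, 1120, 1100],
    [1170, 1140, 1090, 1060],
]
x_min, y_max, cell_size = 0.0, 30.0, 10.0

row, col = world_to_pixel(25.0, 12.0, x_min, y_max, cell_size)
value = elevation[row][col]
print(f"Point (25, 12) falls in row {row}, column {col}: {value} m")
```

The sketch does not check that the point lies inside the grid; a production version would validate the row and column before indexing.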
Some common uses of raster data include:
Satellite Imagery: These images are composed of raster data where each pixel corresponds to a specific area on the Earth’s surface, capturing details like land cover, land surface temperature, etc.
Elevation Models: Digital Elevation Models (DEMs) use raster data to represent the terrain. Each pixel’s value indicates the elevation at that specific point, which is essential for flood modeling, land use planning, and even 3D visualization.
Example: The following code applies the same idea to real raster datasets. It uses Earth Engine and geemap to display an elevation layer (SRTM) and a surface water occurrence layer (JRC Global Surface Water) on a map centered on Calgary, turning per-pixel numerical values into a visual form, just as the colormap did for the random array above.
import ee
import geemap
import geemap.colormaps as cm

# Authenticate and initialize Earth Engine
ee.Authenticate()
ee.Initialize()

# Create a map centered on Calgary
Map = geemap.Map(center=[51.0447, -114.0719], zoom=10)

# Optionally add a basemap (disabled here)
# Map.add_basemap('USGS 3DEP Elevation')

# Add an elevation layer
dem = ee.Image('CGIAR/SRTM90_V4')
elevation = dem.select('elevation')
vis_params = {'min': 0, 'max': 4000, 'palette': cm.palettes.dem}
Map.addLayer(elevation, vis_params, 'SRTM DEM (Version 4)')
Map.add_colorbar(vis_params, label="Elevation (m)", layer_name="SRTM DEM (Version 4)")

# Add a water layer showing where surface water has occurred
water = ee.Image('JRC/GSW1_3/GlobalSurfaceWater')
occurrence = water.select('occurrence')
Map.addLayer(occurrence.updateMask(occurrence.gt(0)), {'palette': "blue"},
             'JRC Global Surface Water (v1.3)')

# Display the map
display(Map)
1.4.2. Vector vs. Raster Data Models
Vector and raster data models are fundamental in GIS for representing spatial information. Each has unique characteristics that make them suitable for different types of spatial analysis {cite:p}`atlas_raster_2024,gisgeography_vector_2015`.
1.4.2.1. Vector Data Advantages
Vector data’s structure allows for complex analyses and representations of the real world, from the precision of property lines to the connectivity of road networks.
Precision and Accuracy: Vector data can represent boundaries and features with a high degree of accuracy, which is essential for detailed mapping and analysis.
Scalability: Unlike raster data, vector data can be scaled up or down without losing quality. This makes vector data ideal for applications that require zooming in and out.
Efficient Storage: For many types of geographical data, vector formats require less storage space, especially when representing sparse data.
Topology: Vector data can encode topology explicitly, representing not just the location of features but also the relationships between them, such as adjacency, connectivity, and containment.
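As a small illustration of a topological relationship, the sketch below decides whether two polygons are adjacent by checking for a shared edge. It is a simplification that only detects edges with exactly matching vertices; GIS libraries handle the general case. The parcel names, coordinates, and function names are hypothetical.

```python
def edges(ring):
    """Return the set of undirected edges of a closed ring of (x, y) vertices."""
    n = len(ring)
    return {frozenset((ring[i], ring[(i + 1) % n])) for i in range(n)}


# Three hypothetical parcels; A and B share the edge from (2, 0) to (2, 2)
parcels = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
    "C": [(5, 0), (6, 0), (6, 1), (5, 1)],
}


def are_adjacent(name_a, name_b):
    """True if the two parcels have at least one edge in common."""
    return bool(edges(parcels[name_a]) & edges(parcels[name_b]))


print("A-B adjacent:", are_adjacent("A", "B"))
print("A-C adjacent:", are_adjacent("A", "C"))
```

Storing edges as unordered pairs means the direction in which each polygon was digitized does not affect the result.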
1.4.2.2. Vector Data Disadvantages
Despite its advantages, vector data can present challenges, particularly when dealing with large datasets or complex spatial relationships.
Complex Data Structure: Vector data can be complex to manage due to the relationships between points, lines, and polygons. This complexity can make data management and analysis more challenging.
Computational Intensity: Certain spatial operations on vector data, such as network analysis or overlay analysis, can be computationally intensive and time-consuming.
1.4.2.3. Raster Data Advantages
Raster data’s simplicity and suitability for certain types of analysis make it a valuable tool in the GIS toolkit, especially when dealing with large, continuous datasets.
Simplicity: Raster data is conceptually simpler and easier to work with, making it accessible to a wide range of users.
Suitable for Continuous Data: Raster is ideal for representing continuous data, such as elevation or temperature gradients, where the phenomenon is measured across the landscape.
Fast Analysis: For certain types of spatial analysis, raster data can be processed quickly due to its regular grid structure.
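The regular grid is what makes cell-by-cell operations, often called map algebra, straightforward: two aligned rasters can be combined by visiting matching cells. The sketch below flags cells that are both forested and below an elevation threshold. The grids and the 1060 m threshold are invented for illustration.

```python
# Two hypothetical rasters covering the same area with the same cell size
elevation_m = [
    [1040, 1055, 1100],
    [1035, 1060, 1120],
]
is_forest = [
    [1, 0, 1],
    [1, 1, 0],
]

# Cell-by-cell overlay: 1 where the cell is forest AND lies below 1060 m
low_forest = [
    [int(elev < 1060 and forest == 1) for elev, forest in zip(elev_row, forest_row)]
    for elev_row, forest_row in zip(elevation_m, is_forest)
]

for row in low_forest:
    print(row)
```

No geometric intersection is computed here; alignment of the grids does the work, which is the main reason such overlays tend to be fast on raster data.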
1.4.2.4. Raster Data Disadvantages
Raster data’s reliance on resolution can be a limiting factor, affecting everything from the accuracy of feature representation to the size of the data files.
Resolution Dependency: The level of detail in raster data is tied to the resolution of the pixels. Higher resolution means more detail but also larger file sizes.
Spatial Inaccuracies: The limits imposed by raster cell dimensions can lead to spatial inaccuracies, especially when representing small or narrow features.
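The trade-off between resolution and file size follows from simple arithmetic: halving the cell size quadruples the number of cells. The sketch below counts cells for a hypothetical 30 km by 30 km study area and estimates the uncompressed size assuming 2 bytes per cell; both figures are assumptions chosen for illustration.

```python
import math


def cell_count(width_m, height_m, cell_size_m):
    """Number of cells needed to cover a rectangular area at a given cell size."""
    cols = math.ceil(width_m / cell_size_m)
    rows = math.ceil(height_m / cell_size_m)
    return rows * cols


BYTES_PER_CELL = 2  # Assumption: 16-bit integer values, no compression

for cell_size in (30, 15):
    n_cells = cell_count(30_000, 30_000, cell_size)
    size_mb = n_cells * BYTES_PER_CELL / 1_000_000
    print(f"{cell_size:>2} m cells: {n_cells:,} cells, about {size_mb:.0f} MB")
```

Real file sizes also depend on the number of bands, the data type, and compression, so these numbers indicate only how storage scales with resolution.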
Both vector and raster data models have their place in GIS and are often used complementarily. Vector data is typically used for precise mapping and detailed analysis of discrete features, while raster data is used for modeling and analyzing continuous phenomena. The choice between vector and raster data depends on the specific requirements of the project, the nature of the spatial data, and the type of analysis to be performed.
1.4.3. Attribute Tables
In Geographic Information Systems (GIS), attribute tables are essential components that store non-spatial data linked to spatial features. Each spatial feature on a map, such as a building, road, or land parcel, corresponds to a record in the attribute table. This record is connected to the feature through a unique numerical identifier known as a Feature Identifier (FID). For example, a park (spatial feature) on a GIS map may have an FID of 102, and its corresponding record in the attribute table could include attributes like area, vegetation type, and usage regulations.
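The link between a feature and its record can be sketched with two dictionaries keyed by FID. Everything in the example below (the coordinates, the names, and the attribute values) is invented for illustration; in practice a GIS or a GeoDataFrame maintains this link automatically.

```python
# Spatial side: geometry of each feature, keyed by FID (hypothetical values)
geometries = {
    101: [(0, 0), (4, 0), (4, 3), (0, 3)],
    102: [(5, 0), (9, 0), (9, 5), (5, 5)],
}

# Non-spatial side: one attribute record per FID (hypothetical values)
attribute_table = {
    101: {"name": "Parcel A", "area_ha": 1.2, "vegetation": "grass",
          "usage": "residential"},
    102: {"name": "Riverside Park", "area_ha": 12.5, "vegetation": "mixed forest",
          "usage": "no motorized vehicles"},
}


def describe(fid):
    """Combine the geometry and the attribute record that share an FID."""
    return {"fid": fid, "geometry": geometries[fid], **attribute_table[fid]}


park = describe(102)
print(park["name"], "-", park["vegetation"], "-", len(park["geometry"]), "vertices")

# Attribute query: select features by a non-spatial condition
forested = [fid for fid, record in attribute_table.items()
            if record["vegetation"] == "mixed forest"]
print("Forested features:", forested)
```

Because both sides share the same key, a query on attributes returns FIDs that can be used to retrieve and draw the matching geometries.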
Example: The code below displays the attribute table for the Calgary Community Boundaries dataset. Each row corresponds to one spatial feature (a community district, such as a residential area or a major park) and is identified by a unique index. The columns, including class, class code, community code, name, and sector, describe the non-spatial characteristics of each feature, while the MULTIPOLYGON column holds its geometry. This pairing of geometry with descriptive attributes is what allows a GIS to support detailed analysis and decision-making.
import geopandas as gpd
import folium
from shapely import wkt
from shapely.geometry import MultiPolygon
# Read the data file
gdf = gpd.read_file('../data/Community_District_Boundaries_20240620.csv')
# Ensure the coordinate reference system is set to 'epsg:4326'
gdf.crs = 'epsg:4326'
# Convert 'MULTIPOLYGON' column to geometry
gdf['geometry'] = gdf['MULTIPOLYGON'].apply(wkt.loads)
display(gdf)
| | CLASS | CLASS_CODE | COMM_CODE | NAME | SECTOR | SRG | COMM_STRUCTURE | CREATED_DT | MODIFIED_DT | MULTIPOLYGON | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Residential | 1 | LEB | LEWISBURG | NORTH | DEVELOPING | BUILDING OUT | 2016/12/21 | 2019/11/26 | MULTIPOLYGON (((-114.0480237 51.1749865, -114.... | MULTIPOLYGON (((-114.04802 51.17499, -114.0471... |
| 1 | Residential | 1 | CSC | CITYSCAPE | NORTHEAST | DEVELOPING | BUILDING OUT | 2016/12/21 | 2016/12/21 | MULTIPOLYGON (((-113.9524996 51.1543075, -113.... | MULTIPOLYGON (((-113.95250 51.15431, -113.9700... |
| 2 | Industrial | 2 | ST1 | STONEY 1 | NORTH | N/A | EMPLOYMENT | 2016/12/21 | 2016/12/21 | MULTIPOLYGON (((-114.0133015 51.1744266, -114.... | MULTIPOLYGON (((-114.01330 51.17443, -114.0147... |
| 3 | Residential | 1 | MRT | MARTINDALE | NORTHEAST | ESTABLISHED | 1980s/1990s | 2016/12/21 | 2020/10/22 | MULTIPOLYGON (((-113.9648991 51.1251901, -113.... | MULTIPOLYGON (((-113.96490 51.12519, -113.9684... |
| 4 | Industrial | 2 | ST2 | STONEY 2 | NORTHEAST | N/A | EMPLOYMENT | 2016/12/21 | 2016/12/21 | MULTIPOLYGON (((-113.9939281 51.153327, -113.9... | MULTIPOLYGON (((-113.99393 51.15333, -113.9939... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 308 | Residential | 1 | DRN | DEER RUN | SOUTH | ESTABLISHED | 1980s/1990s | 2016/12/21 | 2024/04/15 | MULTIPOLYGON (((-114.0118593 50.9381207, -114.... | MULTIPOLYGON (((-114.01186 50.93812, -114.0118... |
| 309 | Major Park | 3 | FPK | FISH CREEK PARK | | | PARKS | 2024/04/02 | 2024/04/15 | MULTIPOLYGON (((-114.1109815 50.9214266, -114.... | MULTIPOLYGON (((-114.11098 50.92143, -114.1109... |
| 310 | Residual Sub Area | 4 | 02L | 02L | | | OTHER | 2016/12/21 | 2024/05/13 | MULTIPOLYGON (((-114.0945798 51.2123357, -114.... | MULTIPOLYGON (((-114.09458 51.21234, -114.0947... |
| 311 | Residential | 1 | ABR | AMBLERIDGE | NORTH | DEVELOPING | BUILDING OUT | 2024/05/13 | 2024/05/13 | MULTIPOLYGON (((-114.1295323 51.1977901, -114.... | MULTIPOLYGON (((-114.12953 51.19779, -114.1413... |
| 312 | Residential | 1 | GLR | GLACIER RIDGE | NORTH | DEVELOPING | BUILDING OUT | 2020/06/01 | 2024/05/13 | MULTIPOLYGON (((-114.1679438 51.196922, -114.1... | MULTIPOLYGON (((-114.16794 51.19692, -114.1679... |
313 rows × 11 columns
# Recreate the GeoDataFrame with the new geometry column
gdf = gpd.GeoDataFrame(gdf, geometry='geometry', crs='epsg:4326')

# Define a color map for different classes
color_map = {
    'Residential': 'red',
    'Industrial': 'blue',
    'Major Park': 'green',
    'Residual Sub Area': 'purple',
    # Add more classes and colors as needed
}

# Initialize the folium map centered around the mean coordinates of the geometries
m = folium.Map(location=[gdf.geometry.centroid.y.mean(), gdf.geometry.centroid.x.mean()],
               zoom_start=9, tiles="cartodbpositron", control_scale=True)

# Add each geometry to the map with a different color based on its class
for _, row in gdf.iterrows():
    folium.GeoJson(
        row['geometry'],
        style_function=lambda feature, color=color_map.get(row['CLASS'], 'black'): {
            'fillColor': color,
            'color': color,
            'weight': 2,
            'fillOpacity': 0.6
        }
    ).add_to(m)

# Create a legend HTML
legend_html = '''
<div style="position: fixed;
            bottom: 50px; left: 50px; width: 150px; height: 150px;
            border:2px solid grey; z-index:9999; font-size:14px;
            background-color:white;
            ">
<b> Legend </b><br>
<i class="fa fa-square" style="color:red"></i> Residential <br>
<i class="fa fa-square" style="color:blue"></i> Industrial <br>
<i class="fa fa-square" style="color:green"></i> Major Park <br>
<i class="fa fa-square" style="color:purple"></i> Residual Sub Area <br>
<!-- Add more classes here if needed -->
</div>
'''

# Add the legend to the map
m.get_root().html.add_child(folium.Element(legend_html))
display(m)
1.4.4. Raster Data Attributes
Raster data represent spatial information through the values assigned to each pixel. When those values are integers that denote categories, each value can be linked to a set of attributes. This is common in land cover datasets, where pixel values denote features such as water bodies, forests, and urban areas. Each category is then described in an attribute table, which may include characteristics such as water quality, forest density, or the regulations governing urban zones.
Example - Visualizing Raster Data with Heatmaps: To illustrate this concept, imagine a 10x10 matrix that represents a raster. This hypothetical raster data can be depicted through a heatmap, where a colorbar indicates the value of each cell by assigning specific colors. This technique is commonly applied in various types of raster data visualization, such as land surface temperature, elevation, and more, to effectively convey differences in values.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Set the random seed for reproducibility
np.random.seed(0)
# Generate a random 10x10 array with values between 0 and 255
X = np.random.randint(256, size=(10, 10))
# Create the figure and axes objects with a specified size
fig, ax = plt.subplots(figsize=(6, 6))
# Create a heatmap using seaborn with annotated values and a specified color map
sns.heatmap(X, annot=True, fmt="d", cmap='Spectral', cbar_kws={'label': 'Color Intensity', 'fraction': 0.046}, ax=ax)
# Customize the colorbar
cbar = ax.collections[0].colorbar
cbar.set_label('Color Intensity', rotation=270, labelpad=20, fontsize=16)
cbar.ax.tick_params(labelsize=16)
# Add a title to the heatmap
ax.set_title("Heatmap with Annotated Pixels", fontsize=15)
# Disable grid lines
ax.grid(False)
# Ensure the heatmap cells are square and adjust the layout
ax.set_aspect('equal')
plt.tight_layout()
The heatmap shows each pixel’s value clearly, and integer values like these could represent land cover categories. Not all raster formats support attribute tables, however. In many GIS applications, raster data are used without attribute tables, and the pixel values alone convey the necessary spatial information.
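Where a raster attribute table is available, it works like a lookup from pixel value to category. The sketch below uses an invented 3 by 3 land cover grid with three hypothetical classes and an assumed 30 m cell size to count pixels per class and convert the counts to areas.

```python
from collections import Counter

# Hypothetical categorical raster: each integer is a land cover code
landcover = [
    [1, 1, 2],
    [3, 2, 2],
    [3, 2, 1],
]

# Hypothetical raster attribute table: one record per pixel value
attribute_table = {
    1: "Water",
    2: "Forest",
    3: "Urban",
}

CELL_SIZE_M = 30  # Assumed cell size, so each cell covers 900 square metres

# Count how many pixels carry each code, then report the area per class
pixel_counts = Counter(value for row in landcover for value in row)
area_m2 = {attribute_table[code]: count * CELL_SIZE_M ** 2
           for code, count in pixel_counts.items()}

for label, area in sorted(area_m2.items()):
    print(f"{label}: {area} m^2")
```

The grid itself stores only small integers; the descriptive information lives once per class in the table rather than once per pixel.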
Example:
To demonstrate a practical application of raster data visualization, let’s take the MODIS Land Cover Type Product (MCD12Q1) as an example. By employing Python and geemap, we can generate a map and overlay a layer to display the land cover data. The code loads the 2013 image from the version 006 asset (MODIS/006/MCD12Q1) and displays it with a predefined color palette that aligns with the International Geosphere-Biosphere Programme (IGBP) land cover classification system.
import geemap
import ee

# Initialize the Earth Engine module.
ee.Initialize()

# Create an interactive map.
Map = geemap.Map(center=[51.0447, -114.0719], zoom=5)

# Set the visualization parameters.
igbpLandCoverVis = {
    "min": 1.0,
    "max": 17.0,
    "palette": [
        "05450a", "086a10", "54a708", "78d203", "009900", "c6b044",
        "dcd159", "dade48", "fbff13", "b6ff05", "27ff87", "c24f44",
        "a5a5a5", "ff6d4c", "69fff8", "f9ffa4", "1c0dff",
    ],
}

# Load the MODIS land cover data.
landcover = ee.Image("MODIS/006/MCD12Q1/2013_01_01").select("LC_Type1")

# Add the land cover layer to the map with the visualization parameters.
Map.addLayer(landcover, igbpLandCoverVis, "MODIS Land Cover")

# Add a legend to the map for the IGBP land cover classification.
Map.add_legend(builtin_legend="MODIS/006/MCD12Q1")

# Display the map.
display(Map)
1.4.5. Measurement Levels
Attributes in GIS are categorized into four measurement levels, each with distinct characteristics:
Nominal Data: These are categorical data without any numeric significance or order. For example, land use types such as residential, commercial, and industrial are nominal data.
Ordinal Data: This data type has a ranked order but no fixed interval between ranks. A soil erosion risk map might classify areas as low, moderate, or high risk, which are ordinal data.
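Interval Data: Numeric data with equal intervals but no true zero point. Temperature scales like Celsius and Fahrenheit are interval data because the difference between degrees is constant, but their zero points are arbitrary and do not indicate an absence of temperature. As a result, differences are meaningful but ratios are not: 20 °C is not twice as warm as 10 °C.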
Ratio Data: Similar to interval data but with a meaningful zero point, allowing for the comparison of relative magnitudes. Examples include population counts and annual rainfall measurements, where zero represents none or no occurrence.
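The measurement level determines which summaries are meaningful. The sketch below pairs each level with a typical operation using the Python standard library; all sample values are invented for illustration.

```python
import statistics

# Nominal: categories without order, so the mode is the usual summary
land_use = ["residential", "commercial", "residential", "industrial"]
most_common_use = statistics.mode(land_use)

# Ordinal: ranked categories, so a median can be taken on the ranks
risk_order = ["low", "moderate", "high"]
erosion_risk = ["low", "high", "moderate", "low", "moderate"]
ranks = [risk_order.index(level) for level in erosion_risk]
median_risk = risk_order[statistics.median_low(ranks)]

# Interval: differences are meaningful, ratios depend on the arbitrary zero
temps_c = [10.0, 20.0]
temps_f = [c * 9 / 5 + 32 for c in temps_c]
ratio_c = temps_c[1] / temps_c[0]   # 2.0 in Celsius
ratio_f = temps_f[1] / temps_f[0]   # 1.36 in Fahrenheit, same temperatures

# Ratio: a true zero makes ratios independent of the unit
rain_mm = [100.0, 200.0]
rain_in = [mm / 25.4 for mm in rain_mm]
ratio_mm = rain_mm[1] / rain_mm[0]
ratio_in = rain_in[1] / rain_in[0]

print("Nominal mode:", most_common_use)
print("Ordinal median:", median_risk)
print(f"Interval ratios: {ratio_c:.2f} (C) vs {ratio_f:.2f} (F)")
print(f"Ratio-level ratios: {ratio_mm:.2f} (mm) vs {ratio_in:.2f} (in)")
```

The temperature ratio changes when the unit changes, while the rainfall ratio stays at 2 whether measured in millimetres or inches, which is the practical difference between interval and ratio data.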